To recap, in Assignment 1, I (1) analysed information about the dataset I have chosen, and its associated publication, (2) mapped the expression data to HUGO gene symbols, (3) filtered out unncecessary genes, and (4) normalized the dataset. The following subsections go over these points in a bit more detail, and are taken from my Assignment 1. In addition, the code from my .rmd file includes code from A1- in order to regenerate the normalized counts matrix.
The expression dataset that I had chosen is GSE221253, taken from the GEO expression data repository. The study associated with this dataset involved adoptive cell therapy. Adoptive cell therapy via tumor infiltrating lymphocytes (ACT-TIL) is a type of immunotherapy used to treat cancer. This study took samples from 13 patients having metastatic melanoma and performed RNAseq analysis, along with other analyses like spatial proteomics and scRNAseq, on tumor tissues before and after the cell therapy (pre- and post-ACT) in order to gain a better understanding of the interactions and cell states within the tumor microenvironment throughout treatment. For the RNAseq experiment, this resulted in a total of 26 samples, and 13 in each condition (two for each patient, and the experimental conditions being pre- and post-ACT treatment).
The first step from Assignment 1 was to map the expression data to HUGO gene symbols. Only 940 out of 19117 genes from our original gene expression raw file were unable to be mapped to HUGO gene symbols. This is actually not that bad!
The next step in Assignment 1 was to filter out the genes with low counts and normalize our data. Filteration of outlier genes with low gene counts removed ~3678 genes from the expression dataset. Then, TMM normalization was performed to normalize the data. The plots (pre and post normalization + filtration) are shown below:
To recap, the first step of Assignment 2 is to perform differential gene expression, in order to identify key genes that are differentially expressed between our subgroups. This involves (1) constructing our model design, (2) estimating the dispersion, (3) performing multiple hypothesis testing and using the Quasi likelihood model to calculate differential expression, and (4) generate a heatmap and MA plot of the data. Next, we performed threshold over-representation analysis
First, this involved defining our model design, and inspecting the MDS plot (below) to see if there was any clustering between our two groups (pre-ACT and post-ACT). It doesn’t seem that there is a lot of clustering amongst the pre-ACT and post-ACT samples. However, the study associated with our data set aims to understand interactions and cell states within the tumor microenvironment throughout treatment, so it makes the most sense to define the group for our model design as pre- and post-ACT treatment.
Here, we estimate the dispersion, fit the model to our specified model design, and then calculate differential expression using the Quasi liklihood model. The table below shows the top gene hits.
|
|
|
|
In the below code blocks, we compile all of the results, and find how many genes pass the threshold value, and how many genes pass correction.
How many genes pass the p-value threshold p < 0.05?
## [1] 4684
Now, let us see how many genes pass correction for multiple hypothesis testing. ~25% of the genes pass correction. This is a decent number, as it is not too much or not too little genes.
## [1] 1643
Here is an MA plot that displays the differential gene expression
between our two groups of samples: pre- and post- ACT.
The following codeblock generates a heatmap of our data, along with
annotations, to see how the data seems to cluster. Based on what is
shown in the heatmap below, there does seem to be some minimal
clustering of genes that are overexpressed in some pre-ACT samples.
The last step of this assignment is to perform thresholded over-representation analysis (ORA). Essentially, our main goal here is to see whether our upregulated or downregulated genes all share some kind of common characteristic (maybe some of them are from the same pathway, etc). By doing this, we can then make conclusions about what kind of genes are overexpressed/underexpressed and ensure that it aligns with literature.
Here, we define our upregulated and downregulated genes. Upregulated genes are defined here to have a p-value of 0.05 and positive fold change. Downregulated genes are defined here to have a p-value of 0.05 and negative fold change.
## [1] "Number of upregulated genes"
## [1] 1170
## [1] "Number of downregulated genes"
## [1] 473
## [1] "Number of significant genes in total"
## [1] 1643
Here, we run run g:profiler on the set of all significant genes that have a p-value of < 0.05 to perform a gene set enrichment analysis.
| query | significant | p_value | term_size | query_size | intersection_size | precision | recall | term_id | source | term_name | effective_domain_size | source_order | parents |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| query_1 | TRUE | 6.630947e-07 | 1494 | 1393 | 199 | 0.14285714 | 0.1331995 | GO:0044281 | GO:BP | small molecule metabolic process | 16178 | 11317 | GO:0008152 |
| query_1 | TRUE | 2.189967e-05 | 2145 | 1393 | 257 | 0.18449390 | 0.1198135 | GO:0009056 | GO:BP | catabolic process | 16178 | 3269 | GO:0008152 |
| query_1 | TRUE | 5.598736e-05 | 4941 | 1393 | 517 | 0.37114142 | 0.1046347 | GO:1901564 | GO:BP | organonitrogen compound metabolic process | 16178 | 22098 | GO:00068…. |
| query_1 | TRUE | 7.868727e-05 | 267 | 1393 | 51 | 0.03661163 | 0.1910112 | GO:0016053 | GO:BP | organic acid biosynthetic process | 16178 | 5145 | GO:00060…. |
| query_1 | TRUE | 9.388388e-05 | 764 | 1393 | 109 | 0.07824838 | 0.1426702 | GO:0006082 | GO:BP | organic acid metabolic process | 16178 | 1967 | GO:00442…. |
| query_1 | TRUE | 9.388388e-05 | 264 | 1393 | 50 | 0.03589375 | 0.1893939 | GO:0046394 | GO:BP | carboxylic acid biosynthetic process | 16178 | 12522 | GO:00160…. |
In the codeblock below, we run g:profiler on the set of upregulated genes to perform a gene set enrichment analysis.
| query | significant | p_value | term_size | query_size | intersection_size | precision | recall | term_id | source | term_name | effective_domain_size | source_order | parents |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| query_1 | TRUE | 0.004049154 | 88 | 997 | 18 | 0.018054162 | 0.20454545 | GO:0042775 | GO:BP | mitochondrial ATP synthesis coupled electron transport | 16178 | 10709 | GO:0042773 |
| query_1 | TRUE | 0.004049154 | 4941 | 997 | 368 | 0.369107322 | 0.07447885 | GO:1901564 | GO:BP | organonitrogen compound metabolic process | 16178 | 22098 | GO:00068…. |
| query_1 | TRUE | 0.004049154 | 88 | 997 | 18 | 0.018054162 | 0.20454545 | GO:0042773 | GO:BP | ATP synthesis coupled electron transport | 16178 | 10707 | GO:00061…. |
| query_1 | TRUE | 0.004049154 | 8726 | 997 | 605 | 0.606820461 | 0.06933303 | GO:0044238 | GO:BP | primary metabolic process | 16178 | 11300 | GO:0008152 |
| query_1 | TRUE | 0.004049154 | 428 | 997 | 51 | 0.051153460 | 0.11915888 | GO:0007005 | GO:BP | mitochondrion organization | 16178 | 2648 | GO:0006996 |
| query_1 | TRUE | 0.004049154 | 6 | 997 | 5 | 0.005015045 | 0.83333333 | GO:0036115 | GO:BP | fatty-acyl-CoA catabolic process | 16178 | 9776 | GO:00091…. |
| query_1 | TRUE | 0.004049154 | 46 | 997 | 13 | 0.013039117 | 0.28260870 | GO:0006120 | GO:BP | mitochondrial electron transport, NADH to ubiquinone | 16178 | 1998 | GO:00196…. |
| query_1 | TRUE | 0.004049154 | 160 | 997 | 26 | 0.026078235 | 0.16250000 | GO:0009060 | GO:BP | aerobic respiration | 16178 | 3273 | GO:0045333 |
| query_1 | TRUE | 0.004244812 | 8309 | 997 | 579 | 0.580742227 | 0.06968348 | GO:0006807 | GO:BP | nitrogen compound metabolic process | 16178 | 2516 | GO:0008152 |
| query_1 | TRUE | 0.004266470 | 9757 | 997 | 666 | 0.668004012 | 0.06825869 | GO:0008152 | GO:BP | metabolic process | 16178 | 3159 | GO:0008150 |
| query_1 | TRUE | 0.004353541 | 99 | 997 | 19 | 0.019057172 | 0.19191919 | GO:0022904 | GO:BP | respiratory electron transport chain | 16178 | 6868 | GO:00229…. |
| query_1 | TRUE | 0.004460309 | 186 | 997 | 28 | 0.028084253 | 0.15053763 | GO:0045333 | GO:BP | cellular respiration | 16178 | 11755 | GO:0015980 |
| query_1 | TRUE | 0.004460309 | 490 | 997 | 55 | 0.055165496 | 0.11224490 | GO:0061919 | GO:BP | process utilizing autophagic mechanism | 16178 | 16215 | GO:0009987 |
| query_1 | TRUE | 0.004460309 | 490 | 997 | 55 | 0.055165496 | 0.11224490 | GO:0006914 | GO:BP | autophagy | 16178 | 2592 | GO:00442…. |
| query_1 | TRUE | 0.004460309 | 59 | 997 | 14 | 0.014042126 | 0.23728814 | GO:0072332 | GO:BP | intrinsic apoptotic signaling pathway by p53 class mediator | 16178 | 17869 | GO:00723…. |
| query_1 | TRUE | 0.005219212 | 61 | 997 | 14 | 0.014042126 | 0.22950820 | GO:0010257 | GO:BP | NADH dehydrogenase complex assembly | 16178 | 4072 | GO:0065003 |
| query_1 | TRUE | 0.005219212 | 61 | 997 | 14 | 0.014042126 | 0.22950820 | GO:0032981 | GO:BP | mitochondrial respiratory chain complex I assembly | 16178 | 8243 | GO:00102…. |
| query_1 | TRUE | 0.005971288 | 1168 | 997 | 107 | 0.107321966 | 0.09160959 | GO:0009057 | GO:BP | macromolecule catabolic process | 16178 | 3270 | GO:00431…. |
| query_1 | TRUE | 0.007634004 | 2145 | 997 | 176 | 0.176529589 | 0.08205128 | GO:0009056 | GO:BP | catabolic process | 16178 | 3269 | GO:0008152 |
| query_1 | TRUE | 0.007634004 | 107 | 997 | 19 | 0.019057172 | 0.17757009 | GO:0022900 | GO:BP | electron transport chain | 16178 | 6867 | GO:0006091 |
Here, we do the same for the set of downregulated genes, and perform a gene set enrichment analysis on them.
| query | significant | p_value | term_size | query_size | intersection_size | precision | recall | term_id | source | term_name | effective_domain_size | source_order | parents |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| query_1 | TRUE | 1.545313e-14 | 1494 | 396 | 94 | 0.23737374 | 0.06291834 | GO:0044281 | GO:BP | small molecule metabolic process | 16178 | 11317 | GO:0008152 |
| query_1 | TRUE | 2.470354e-14 | 764 | 396 | 63 | 0.15909091 | 0.08246073 | GO:0006082 | GO:BP | organic acid metabolic process | 16178 | 1967 | GO:00442…. |
| query_1 | TRUE | 1.688262e-13 | 758 | 396 | 61 | 0.15404040 | 0.08047493 | GO:0043436 | GO:BP | oxoacid metabolic process | 16178 | 11048 | GO:0006082 |
| query_1 | TRUE | 1.688262e-13 | 739 | 396 | 60 | 0.15151515 | 0.08119080 | GO:0019752 | GO:BP | carboxylic acid metabolic process | 16178 | 6252 | GO:0043436 |
| query_1 | TRUE | 5.379899e-11 | 299 | 396 | 34 | 0.08585859 | 0.11371237 | GO:0044282 | GO:BP | small molecule catabolic process | 16178 | 11318 | GO:00090…. |
| query_1 | TRUE | 5.379899e-11 | 24 | 396 | 12 | 0.03030303 | 0.50000000 | GO:0042730 | GO:BP | fibrinolysis | 16178 | 10679 | GO:0030195 |
Here, we present the number of gene sets returned from both the downregulated and upregulated analyses:
## [1] "How many genesets were returned from downregulated analysis?"
## [1] 5057
## [1] "How many genesets were returned from upregulated analysis"
## [1] 7844
The authors in the original paper concluded that in samples after Adoptive Cell Therapy, there was an increase in the strength and number of tumor infiltrating lymphocytes (TILSs), as well as an increase in interferon-activated myeloid T-cells (Barras 2024). This suggests that there seems to be an overall improved immune response towards the tumor after cell therapy. Upon looking at the over-representation results for down-regulated and up-regulated genes, however, there does not seem to be much correlation between the papers findings and the over-representation results. The overrepresentation results list common metabolic process such as “small molecule metabolic process” and “mitochondrial ATP synthesis coupled electron transport” as top hits, but these are not indicative of anything as they are very general/broad processes.
In this section, we begin assignment 3. The first step is to perform non-threshold Gene set Enrichment Analysis. Next, we will visualize our GSEA using Cytoscape. Lastly, we will interpret and have a more granular understanding of our results.
There are a lot of existing gene set analysis algorithm that are non-thresholded, but we will be using GSEA, as it is very popular, and well trusted throughout literature (the number of citations it has tops all other methods). We will also be using gene sets from the BaderLab.
#define path to the shell script to run the command line GSEA on
gsea_jar <- "GSEA_4.3.3/gsea-cli.sh"
# was previously set to true, but once we already have all of our results, we dont need this to be true anymore.
run_gsea <- TRUE
#name of our analysis
analysis_name <- "Pre_vs_Post_ACT"
rnk_file <- "Pre_vs_post_ACT_edger_ranks.rnk"
# obtain the rank file to use in GSEA analysis
#first sort our original gene hit list (from A2 recap) by computing the rank from the rank formula
qlf_output_hits_rnk <- data.frame(
gene = rownames(qlf_output_hits$table),
rank = -log10(qlf_output_hits$table$PValue) * sign(qlf_output_hits$table$logFC)
)
#order our rank file based on decreasing rank value
qlf_output_hits_rnk <- qlf_output_hits_rnk[order(qlf_output_hits_rnk$rank, decreasing = TRUE), ]
file_path <- "Pre_vs_post_ACT_edger_ranks.rnk"
#write the necessary rank fie out to a .rnk file, if it does not exist
if(!file.exists(file_path)){
write.table(qlf_output_hits_rnk, file = file_path, sep = "\t", quote = FALSE, row.names = FALSE, col.names = TRUE)
}
Here, we acquire the BaderLab dataset files, and first check to see if it exists.
# Here, we get all the necessary gene sets from the BaderLab
# Now we need to get the gene sets from the Bader Lab
gmt_url = "http://download.baderlab.org/EM_Genesets/current_release/Human/symbol/"
# list all the files on server
filenames = getURL(gmt_url)
tc = textConnection(filenames)
contents = readLines(tc)
close(tc)
# get gmt with all the GO pathways, and DOES NOT have terms from electronic annotations.
# for the purposes of this analysis, we will start with the gmt file that has only pathways
rx <- gregexpr("(?<=<a href=\")(.*.GOBP_AllPathways_noPFOCR_no_GO_iea.*.)(.gmt)(?=\">)",
contents,
perl = TRUE
)
gmt_file <- unlist(regmatches(contents, rx))
if(!file.exists(gmt_file)){
download.file(
paste(gmt_url,gmt_file,sep=""),
destfile=gmt_file
)
}
Here, we have set run_gsea to FALSE, as this command was already run in a previous iteration, and all the relevant files were already added to the current directory.
# here, we will set up the command for GSEA and run it (if GSEA == true)
run_gsea <- FALSE
if(run_gsea){
command <- paste("",gsea_jar,
"GSEAPreRanked -gmx", gmt_file,
"-rnk" ,rnk_file,
"-collapse false -nperm 1000 -scoring_scheme weighted",
"-rpt_label ",analysis_name,
" -plot_top_x 20 -rnd_seed 12345 -set_max 200",
" -set_min 15 -zip_report false ",
" -out" , getwd(),
" > gsea_output.txt",sep=" ")
system(command)
}
#at this point, we have completed running GSEA, and we have moved the relevant files into the working directory!
In this code block we quickly visualize our up-regulated pathways (postive normalize enrichment score- NES) and downregulated pathways (neg normalized enrichment score- NES). The up- regulated and down- regulated pathways are relative to out Post-ACT treatment group.
downregulated_pathways <- read.delim("gsea_report_for_na_neg_1712026640651.tsv")
head(downregulated_pathways)
## NAME
## 1 NEGATIVE REGULATION OF WOUND HEALING%GOBP%GO:0061045
## 2 TERPENOID METABOLIC PROCESS%GOBP%GO:0006721
## 3 CYTOCHROME P450 - ARRANGED BY SUBSTRATE TYPE%REACTOME DATABASE ID RELEASE 81%211897
## 4 DITERPENOID METABOLIC PROCESS%GOBP%GO:0016101
## 5 NEGATIVE REGULATION OF HEMOSTASIS%GOBP%GO:1900047
## 6 NEGATIVE REGULATION OF BLOOD COAGULATION%GOBP%GO:0030195
## GS.br..follow.link.to.MSigDB
## 1 NEGATIVE REGULATION OF WOUND HEALING%GOBP%GO:0061045
## 2 TERPENOID METABOLIC PROCESS%GOBP%GO:0006721
## 3 CYTOCHROME P450 - ARRANGED BY SUBSTRATE TYPE%REACTOME DATABASE ID RELEASE 81%211897
## 4 DITERPENOID METABOLIC PROCESS%GOBP%GO:0016101
## 5 NEGATIVE REGULATION OF HEMOSTASIS%GOBP%GO:1900047
## 6 NEGATIVE REGULATION OF BLOOD COAGULATION%GOBP%GO:0030195
## GS.DETAILS SIZE ES NES NOM.p.val FDR.q.val FWER.p.val
## 1 Details ... 51 -0.7474599 -2.933088 0 0 0
## 2 Details ... 58 -0.7330024 -2.871940 0 0 0
## 3 Details ... 48 -0.7501460 -2.867836 0 0 0
## 4 Details ... 52 -0.7482500 -2.864391 0 0 0
## 5 Details ... 36 -0.8118737 -2.861843 0 0 0
## 6 Details ... 36 -0.8118737 -2.858984 0 0 0
## RANK.AT.MAX LEADING.EDGE X
## 1 638 tags=29%, list=4%, signal=31% NA
## 2 1295 tags=45%, list=8%, signal=49% NA
## 3 1610 tags=50%, list=10%, signal=56% NA
## 4 1295 tags=46%, list=8%, signal=50% NA
## 5 599 tags=39%, list=4%, signal=40% NA
## 6 599 tags=39%, list=4%, signal=40% NA
upregulated_pathways <- read.delim("gsea_report_for_na_pos_1712026640651.tsv")
head(upregulated_pathways)
## NAME
## 1 OXIDATIVE PHOSPHORYLATION%GOBP%GO:0006119
## 2 MITOCHONDRIAL ATP SYNTHESIS COUPLED ELECTRON TRANSPORT%GOBP%GO:0042775
## 3 AEROBIC ELECTRON TRANSPORT CHAIN%GOBP%GO:0019646
## 4 AEROBIC RESPIRATION%GOBP%GO:0009060
## 5 AEROBIC RESPIRATION I (CYTOCHROME C)%BIOCYC%PWY-3781
## 6 ATP SYNTHESIS COUPLED ELECTRON TRANSPORT%GOBP%GO:0042773
## GS.br..follow.link.to.MSigDB
## 1 OXIDATIVE PHOSPHORYLATION%GOBP%GO:0006119
## 2 MITOCHONDRIAL ATP SYNTHESIS COUPLED ELECTRON TRANSPORT%GOBP%GO:0042775
## 3 AEROBIC ELECTRON TRANSPORT CHAIN%GOBP%GO:0019646
## 4 AEROBIC RESPIRATION%GOBP%GO:0009060
## 5 AEROBIC RESPIRATION I (CYTOCHROME C)%BIOCYC%PWY-3781
## 6 ATP SYNTHESIS COUPLED ELECTRON TRANSPORT%GOBP%GO:0042773
## GS.DETAILS SIZE ES NES NOM.p.val FDR.q.val FWER.p.val
## 1 Details ... 73 0.6527362 2.173621 0 0.0000000000 0.000
## 2 Details ... 66 0.6572567 2.165148 0 0.0000000000 0.000
## 3 Details ... 65 0.6504819 2.151371 0 0.0000000000 0.000
## 4 Details ... 102 0.6214411 2.134509 0 0.0002267553 0.001
## 5 Details ... 62 0.6590620 2.132997 0 0.0001814043 0.001
## 6 Details ... 66 0.6572567 2.132463 0 0.0001511702 0.001
## RANK.AT.MAX LEADING.EDGE X
## 1 3704 tags=71%, list=24%, signal=93% NA
## 2 3127 tags=68%, list=20%, signal=85% NA
## 3 3127 tags=66%, list=20%, signal=83% NA
## 4 3704 tags=64%, list=24%, signal=83% NA
## 5 3704 tags=76%, list=24%, signal=99% NA
## 6 3127 tags=68%, list=20%, signal=85% NA
1.What method did you use? What genesets did you use? Make sure to specify versions and cite your methods. There are a lot of existing gene set analysis algorithm that are non-thresholded, but I decided to use GSEA, as it is very popular, and well trusted throughout literature (the number of citations it has tops all other methods). I also used gene sets from the BaderLab, which can be found here. To perform the GSEA, I followed the methods outlined here
2.Summarize your enrichment results. Upon opening the index file outputted by GSEA, we can see that:
4263 / 5903 gene sets are upregulated in phenotype Post-Adoptive Cell Therapy Treatment, 1049 gene sets are significant at FDR < 25%, 445 gene sets are significantly enriched at nominal pvalue < 1% 944 gene sets are significantly enriched at nominal pvalue < 5%,
1640 / 5903 gene sets are upregulated in phenotype Pre-Adoptive Cell Therapy Treatment, 682 gene sets are significantly enriched at FDR < 25%, 441 gene sets are significantly enriched at nominal pvalue < 1%, 605 gene sets are significantly enriched at nominal pvalue < 5%
3. How do these results compare to the results from the thresholded analysis in Assignment #2. Compare qualitatively. Is this a straight forward comparison? Why or why not? From the statistics in question 2, we see that, more genes are up-regulated than down-regulated in the Post-ACT groups. This is also what we observed from our g:profiler results- the number of up-regulated genes was significantly more than the number of down-regulated genes. So in this regard, the comparison is straigntforward when comparing both the thresholded and non-thresholded analysis results. However, when performing thresholded over-representation analysis in Assignment 2, we first had to filter out the genes based on their p-values, to ensure that we only had significant up-regulated and down-regulated genes, resulting in an overall lower number of genes than GSEA. For GSEA, we end up using all of the genes, regardless of their p-value, to ensure that we dont loose any signal. So in this regard, it is not really a straightforward comparison due to the differing methods.
Our next step is to use our results from non-thresholded GSEA and visualize our results in Cytoscape. In these figures presented, a node represents a gene set, and an edge represents genes in common between two gene sets. Red nodes represent gene sets specifically up-regulated in the post-ACT group, and blue nodes represents gene sets specifically down-regulated in the post-ACT group (or up-regulated in the pre-ACT group).
1.Create an enrichment map - how many nodes and how many
edges in the resulting map? What thresholds were used to create this
map? Make sure to record all thresholds. Include a screenshot of your
network prior to manual layout. Enrichment map is created below
(with pathway annotations). There are a total of 323 nodes and 2352
edges. To generate by map, I used a node cutoff threshold of Q-value =
0.01, and for my edge cutoff, I used a threshold value of Q-value =
0.375. I used default parameters for the rest of the fields. The figure above shows the
enrichment map representing gene sets for two groups: pre-ACT patients
and post-ACT patients. Nodes in blue represent gene sets that are
down-regulated in post-ACT patients. Nodes in red represent gene sets
that are up-regulated in post-ACT patients.
2.Annotate your network - what parameters did you use to annotate the network. If you are using the default parameters make sure to list them as well. For my annotated network (figures below), I decided to annotate each gene set with its pathway name. I also decided to ensure that there are not any overlaps in cluster names, to ensure a clear image and no overlapping words. I also added a figure legend to both the images below.
3.Make a publication ready figure - include this figure with proper legends in your notebook. My publication ready figures, each with a figure legend are presented in question 2.
4.Collapse your network to a theme network. What are the major themes present in this analysis? Do they fit with the model? Are there any novel pathways or themes? My images above from question 2, are collapsed into “themes”/clusters. From the images presented in question 2, we see that there is a lot of pathways associated with processes: “ximelagatran action pathway”, “valdecoxcib action pathway”, “mediated fceri activation”, and “respiratory electron transport”. There are other themes present, but these are just the largest ones. There is not really a clear indication that these themes fit with my model- the authors from the original paper concluded that in samples post-ACT, there was an increase in the strength and number of tumor infiltrating lymphocytes (TILSs), as well as an increase in interferon-activated myeloid T-cells (Barras 2024), and from the themes it seems that valdecoxib action pathways, which treat inflammation and pain are down-regulated in the post-ACT group, whereas pathways related to fceri activation (which cause the release of inflammatory mediators such as histamines) were up-regulated in the post-ACT group. I am not able to find a reason for why there would be increased inflammation in post-ACT patients. I hypothesize that it might have something to do with inflammatory responses activated due to the cell therapy. Also, from my understanding, there are no novel pathways or themes present.
1.Do the enrichment results support conclusions or mechanism discussed in the original paper? How do these results differ from the results you got from Assignment #2 thresholded methods I was not able to find any type of correlation between the conclusions discussed in the original paper and the enrichment results from GSEA. As I mentioned above, the original paper concluded that in samples post Adoptive Cell Therapy (ACT), there was an increase in the strength and number of tumor infiltrating lymphocytes (TILSs), as well as an increase in interferon-activated myeloid T-cells (Barras 2024). This means that post-ACT treatment, the bodies immune system has become stronger. The enrichment map shows a predominant cluster of pathways that are down regulated in post-ACT patients under the theme “valdecoxib action pathways”, and upon doing some literature search, I found that valdecoxib is essentially a drug that prevents inflammation and pain, but was not sure why it would be down-regulated in post-ACT patients. In addition, one of the other major themes that were up-regulated in post-ACT patients, present in the enrichment map was “fceri mediated activation”, which caused the release of inflammatory mediators such as histamine. I was also not able to find any literature as to why these pathways would be upregulated in post-ACT patients.
In regards to comparing these results with the results from assignment 2’s thresholded g:profiler methods, there was not much of a correlation. Common pathways presented in A2 were apoptotic pathways and molecular metabolic processes, neither of which really correlate with the big themes presented in the GSEA enrichment map.
2.Can you find evidence, i.e. publications, to support some of the results that you see. How does this evidence support your result? As I mentioned above, fceri mediated activation was a major theme presented in my enrichment map above. However, I was not able to find much evidence as to why these pathways were up-regulated in post-ACT patients.
The pathway that I chose to investigate in more detail was the FCERI MEDIATED NF-KB ACTIVATION pathway. I decided to choose this pathway as it is found in one of the most prevalent themes in our network above, and is up-regulated in post-ACT patients. Thus, I wanted to investigate it in a bit more detail, to see if I can find any information and insights as to why pathways that mediate release of inflammatory molecules are up-regulated. Below, I used STRING to generate the pathway as a gene network. I was unable to, however annotate the network or pathway with my log fold expression values and p-values, because every time I tried to import my rank file, and select gene names as the parameter, Cytoscape wouldn’t generate the correct output. I have explained this issue in my journal as well.
Morgan M, Ramos M (2023). BiocManager: Access the Bioconductor Project Package Repository. R package version 1.30.22, https://CRAN.R-project.org/package=BiocManager.
Davis, S. and Meltzer, P. S. GEOquery: a bridge between the Gene Expression Omnibus (GEO) and BioConductor. Bioinformatics, 2007, 14, 1846-1847
Xie Y (2023). knitr: A General-Purpose Package for Dynamic Report Generation in R. R package version 1.45, https://yihui.org/knitr/.
Robinson MD, McCarthy DJ and Smyth GK (2010). edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26, 139-140
Response to tumor-infiltrating lymphocyte adoptive therapy is associated with preexisting CD8+ T-myeloid cell … (n.d.). https://www.science.org/doi/10.1126/sciimmunol.adg7995
Gu Z, Eils R, Schlesner M (2016). “Complex heatmaps reveal patterns and correlations in multidimensional genomic data.” Bioinformatics. doi:10.1093/bioinformatics/btw313
Gu Z (2022). “Complex Heatmap Visualization.” iMeta. doi:10.1002/imt2.43.
Kolberg L, Raudvere U, Kuzmin I, Vilo J, Peterson H (2020).
“gprofiler2– an R package for gene list functional enrichment analysis
and namespace conversion toolset
g:Profiler.” F1000Research, 9 (ELIXIR)(709). R package version
0.2.3.
Seidel, C. S. C. (n.d.). Intro to Edger. https://research.stowers.org/cws/CompGenomics/Projects/edgeR.html
Shi D, Jiang P. A Different Facet of p53 Function: Regulation of Immunity and Inflammation During Tumor Development. Front Cell Dev Biol. 2021 Oct 18;9:762651. doi:10.3389/fcell.2021.762651. PMID: 34733856; PMCID: PMC8558413.
Agrawal a, et al. (2024) WikiPathways 2024: next generation pathway database. NAR.
Milacic M, Beavers D, Conley P, Gong C, Gillespie M, Griss J, Haw R, Jassal B, Matthews L, May B, Petryszak R, Ragueneau E, Rothfels K, Sevilla C, Shamovsky V, Stephan R, Tiwari K, Varusai T, Weiser J, Wright A, Wu G, Stein L, Hermjakob H, D’Eustachio P. The Reactome Pathway Knowledgebase 2024. Nucleic Acids Research. 2024. doi: 10.1093/nar/gkad1025.
Ashburner et al. Gene ontology: tool for the unification of biology. Nat Genet. 2000 May;25(1):25-9. DOI: 10.1038/75556
Cytoscape App Store - enrichmentmap. (n.d.). https://apps.cytoscape.org/apps/enrichmentmap
Gary Bader, R. I. (n.d.). Pathway and network analysis of -OMICS data ( June 2023 ). Module 3 Lab: GSEA Visualization.https://baderlab.github.io/CBW_Pathways_2023/gsea_mod3.html
“Valdecoxib (Oral Route) Description and Brand Names.” Mayo Clinic, Mayo Foundation for Medical Education and Research, 1 Feb. 2024, www.mayoclinic.org/drugs-supplements/valdecoxib-oral-route/description/drg-20066654.
“FC Epsilon Receptor (FCERI) Signaling.” National Center for Biotechnology Information. PubChem Compound Database, U.S. National Library of Medicine, pubchem.ncbi.nlm.nih.gov/pathway/Reactome:R-HSA-2454202.Accessed 3 Apr. 2024.